Multi-modal Unsupervised Feature Learning for RGB-D Scene Labeling
Abstract
Most existing approaches to RGB-D indoor scene labeling employ hand-crafted features for each modality independently and combine them heuristically. There have been some attempts to learn features directly from raw RGB-D data, but their performance is not satisfactory. In this paper, we adapt unsupervised feature learning to RGB-D labeling by treating it as a multi-modality learning problem. Our learning framework performs feature learning and feature encoding simultaneously, which significantly boosts performance. By stacking the basic learning structure, higher-level features are derived and combined with lower-level features to better represent RGB-D data. Experimental results on the benchmark NYU depth dataset show that our method achieves competitive performance compared with the state of the art.
Similar resources
Correlated and Individual Multi-Modal Deep Learning for RGB-D Object Recognition
In this paper, we propose a correlated and individual multi-modal deep learning (CIMDL) method for RGB-D object recognition. Unlike most conventional RGB-D object recognition methods, which extract features from the RGB and depth channels individually, our CIMDL jointly learns feature representations from raw RGB-D data with a pair of deep neural networks, so that the sharable and modal-specific ...
Combining Models from Multiple Sources for RGB-D Scene Recognition
Depth can complement RGB with useful cues about object volumes and scene layout. However, RGB-D image datasets are still too small for directly training deep convolutional neural networks (CNNs), in contrast to the massive monomodal RGB datasets. Previous works in RGB-D recognition typically combine two separate networks for RGB and depth data, pretrained with a large RGB dataset and then fine ...
On the Applicability of Unsupervised Feature Learning for Object Recognition in RGB-D Data
We present a feature extraction method for RGB-D data based on k-means clustering that builds on recent work by Coates et al. Using unsupervised learning methods we are able to automatically learn feature responses that combine all available information (color and depth) into one, concise representation. We show that depth information can substantially increase the recognition performance and t...
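The k-means approach summarized above can be illustrated with a minimal sketch. The truncated abstract gives no implementation details, so everything here is an assumption chosen for illustration: the function names, the 6x6 patch size, the 16 centroids, the per-patch normalization, and the "triangle" encoding (distance to each centroid subtracted from the mean distance, clipped at zero) that is commonly paired with k-means feature learning. Whitening, which such pipelines often include, is omitted for brevity.

```python
import numpy as np

def extract_patches(rgbd, patch=6, n=500, rng=None):
    """Sample n random patch x patch windows from an H x W x 4 RGB-D array.

    Color and depth are flattened into a single vector per patch, so both
    modalities feed one joint representation (hypothetical helper).
    """
    rng = np.random.default_rng(rng)
    h, w, _ = rgbd.shape
    ys = rng.integers(0, h - patch + 1, size=n)
    xs = rng.integers(0, w - patch + 1, size=n)
    return np.stack([rgbd[y:y + patch, x:x + patch, :].ravel()
                     for y, x in zip(ys, xs)])

def normalize(X, eps=1e-8):
    """Zero-mean, unit-variance normalization of each patch vector."""
    X = X - X.mean(axis=1, keepdims=True)
    return X / (X.std(axis=1, keepdims=True) + eps)

def kmeans(X, k=16, iters=10, rng=None):
    """Plain Lloyd's k-means; returns the k x D centroid dictionary."""
    rng = np.random.default_rng(rng)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every patch to every centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def triangle_encode(X, centroids):
    """Soft encoding: max(0, mean distance - distance to centroid j)."""
    d = np.sqrt(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2))
    return np.maximum(0.0, d.mean(axis=1, keepdims=True) - d)

# Synthetic stand-in for an RGB-D frame (3 color channels + 1 depth channel).
rgbd = np.random.default_rng(0).random((48, 64, 4))
X = normalize(extract_patches(rgbd, rng=0))
C = kmeans(X, rng=0)
F = triangle_encode(X, C)
print(F.shape)  # (500, 16)
```

Each row of `F` is a sparse, non-negative feature vector for one patch; in a full pipeline these responses would typically be pooled over image regions before classification.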
Unsupervised Feature Learning for RGB-D Image Classification
Motivated by the success of Deep Neural Networks in computer vision, we propose a deep Regularized Reconstruction Independent Component Analysis network (RICA) for RGB-D image classification. In each layer of this network, we include a RICA as the basic building block to determine the relationship between the gray-scale and depth images corresponding to the same object or scene. Implementing co...
Cross-modal Sound Mapping Using Deep Learning
We present a method for automatic feature extraction and cross-modal mapping using deep learning. Our system uses stacked autoencoders to learn a layered feature representation of the data. Feature vectors from two (or more) different domains are mapped to each other, effectively creating a cross-modal mapping. Our system can either run fully unsupervised, or it can use high-level labeling to f...